ABSTRACT


Research on the SARS-associated coronavirus (SARS-CoV) has not stopped since its discovery, but
the pathogenesis of SARS is still unclear. To explore the possible molecular mechanisms of the
invasion and virulence of SARS-CoV, we investigated the structural basis of the viral proteins
using computational biology. Forty-five motifs relating to superantigens, toxins and other
bioactive molecules were detected in the proteins of SARS-CoV. The results showed that the
distribution of the motifs varied among proteins. Enzyme-like motifs were located in the R
protein, while ICAM-1-like and toxin-like motifs were located in the spike, envelope,
nucleocapsid, PUP1, PUP2 and PUP4 proteins. Comparison of SARS-CoV with other viruses (OC43,
PEDV, HRSV, HHerpV and HAdenoV) showed that each type of virus carried a different set of
motifs. The data suggest that the proteins of SARS-CoV with toxic motifs might play crucial
roles in targeting host cells and interfering with the immune system. This study provides new
information for drug and vaccine design, as well as therapeutic strategies against SARS.


INTRODUCTION 


THE SEVERE ACUTE RESPIRATORY SYNDROME (SARS), with high rates of morbidity and mortality, has
affected thousands of people and killed hundreds of them since November 2002. The
SARS-associated coronavirus (SARS-CoV) was first identified as the pathogen of this disease in
April 2003 (3,6).

The most common symptoms of the disease were fever (>38.5°C), dry cough, myalgia, shortness of
breath, and dyspnea. The illness progressed very quickly. Conventional laboratory tests showed
lymphopenia and slight leukopenia. Furthermore, traditional antibacterial treatment had little
effect on this disease (15).

Based on the chest radiographs and histopathological investigation, the pathological changes
were characterized by massive infiltration and marked alveolar edema with hemorrhage and hyaline
membrane formation, and even lymphoid atrophy and widespread angiitis, but few inflammatory
cells were observed. Furthermore, the serological evidence showed an increased level of IgG in
the second week after onset of the symptoms (12,15). However, flow cytometry demonstrated that
both CD4+ and CD8+ T cells decreased significantly and recovered as soon as the symptoms
disappeared (7). The above clinical evidence suggested that viral damage and an allergic immune
response could play a major role in the illness and may finally result in the Adult Respiratory
Distress Syndrome (ARDS), which could be lethal to the patients.

There is no doubt that the special proteins of SARS-CoV determine its invasion of and virulence
against its host (especially humans). In a previous study (13), we identified four major
structural proteins, namely, the spike protein (S protein), the envelope protein (E protein),
the membrane protein (M protein), and the nucleocapsid protein (N protein), as well as five
putative uncharacterized proteins (PUPs), all of which are exogenous substances to the human
body. Here, we investigate the motifs of SARS-CoV with regard to immunity and toxicity in order
to reveal the possible molecular mechanism of the disease.


MATERIALS AND METHODS 


Source of sequences. The SARS-CoV BJ01 isolate was selected for the analysis; its genome had
been sequenced in our previous study (13). The putative proteins encoded in the genome were
analyzed. We also analyzed the proteins encoded in the genomes of the human coronavirus OC43
(OC43, NC_005147), the porcine epidemic diarrhea virus (PEDV, NC_003436), the human respiratory
syncytial virus (HRSV, NC_001781), the human adenovirus A (HAdenoV, NC_001460) and the human
herpesvirus (HHerpV, NC_001806). In addition, a local database was set up on an IBM p690
machine, containing a total of 62,228 known protein sequences from the public database at NCBI,
of which 38,010 were associated with antigens, 861 with superantigens (sAgs), 4,773 with
cytokines, and 18,584 with toxins. Because the majority of these sequences were derived from
experimental studies of protein function, this database was used to detect functional motifs by
sequence alignment.
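The composition of the local database can be checked with a few lines of arithmetic. The sketch below only restates the four category counts given in the text; the dictionary keys and variable names are illustrative and are not taken from the original database.

```python
# Category counts of the local database, as reported in the text.
# The key names are illustrative labels, not the original database fields.
category_counts = {
    "antigens": 38_010,
    "superantigens": 861,
    "cytokines": 4_773,
    "toxins": 18_584,
}

# The four categories should add up to the reported total of 62,228 sequences.
total = sum(category_counts.values())

# Share of each category, in percent of the whole database.
shares = {name: 100 * n / total for name, n in category_counts.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The four categories add up exactly to the reported total, so the categories appear to be non-overlapping in the counts given.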

Sequence analysis. The protein sequences of SARS-CoV were aligned using BLAST
(ftp://ftp.ncbi.nih.gov/blast/) against the local database described above. BLAST results with
an identity value of more than 30% were statistically analyzed. Redundant data and sequences
shorter than 50 amino acids showing less than 35% identity were removed. Selected segments were
classified according to their annotations. NetNGlyc 1.0 software was used to detect
glycosylation sites of the proteins (www.cbs.dtu.dk/services/NetNGlyc). The physical and
chemical features of each peptide were examined using Compute pI/MW, ProtScale
(http://us.expasy.ch/tools) and Genhan (www.genhan.net/peptide!.htm).
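The filtering rule described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the `Hit` record, the function names and the example hits are hypothetical, and "redundant" is assumed here to mean an exact duplicate of the same query region and subject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment hit (hypothetical record layout)."""
    query: str       # SARS-CoV protein name
    subject: str     # annotation of the matched database entry
    start: int       # first aligned residue in the query (1-based)
    end: int         # last aligned residue in the query (inclusive)
    identity: float  # percent identity reported for the alignment

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def keep_hit(hit: Hit) -> bool:
    """Apply the two thresholds stated in the text."""
    if hit.identity <= 30.0:                     # only hits above 30% identity
        return False
    if hit.length < 50 and hit.identity < 35.0:  # short hits need at least 35%
        return False
    return True


def filter_hits(hits):
    """Drop exact duplicates (assumed meaning of 'redundant'), then filter."""
    seen = set()
    kept = []
    for hit in hits:
        if hit in seen:
            continue
        seen.add(hit)
        if keep_hit(hit):
            kept.append(hit)
    return kept


# Made-up example hits, for illustration only.
example = [
    Hit("S", "toxin A", 10, 40, 33.0),    # short and below 35%: removed
    Hit("S", "toxin B", 100, 160, 31.0),  # long enough, above 30%: kept
    Hit("E", "toxin C", 2, 15, 43.0),     # short but above 35%: kept
    Hit("E", "toxin C", 2, 15, 43.0),     # exact duplicate: removed
]
print([h.subject for h in filter_hits(example)])
```

Keeping the thresholds in one small predicate makes it easy to see that short alignments are held to a stricter identity cutoff than long ones.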

Assessing the conservation of the motifs. The conservation of the motifs shorter than 50 amino
acids with 35-39% identity was examined against six other types of coronaviruses, including the
avian infectious bronchitis virus, the human coronavirus 229E, the porcine epidemic diarrhea
virus, the human coronavirus OC43 and the murine hepatitis virus.
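As an illustration of the conservation check, the sketch below selects motifs in the stated window (shorter than 50 residues, 35-39% identity) and computes a percent identity between two pre-aligned sequences. The function names and the toy sequences are hypothetical, and identity is assumed here to be counted over aligned, non-gap columns; the text does not state which definition was used.

```python
def is_conservation_candidate(length: int, identity: float) -> bool:
    """Motifs shorter than 50 residues with 35-39% identity, as in the text."""
    return length < 50 and 35 <= identity <= 39


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap.

    Both inputs must already be aligned (same length, '-' marks a gap).
    Counting only non-gap columns is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned peptides (not real SARS-CoV sequences).
print(round(percent_identity("ACDEFG", "ACDQFG"), 1))
print(is_conservation_candidate(length=28, identity=39))
```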
RESULTS


Distribution of the identified motifs in the SARS-CoV. Overall, 45 sequence entries related to
antigenicity and toxicity were identified by comparing the protein sequences of SARS-CoV with
the local database containing 62,228 known sequences. The average length of the identified
motifs was 39.5 ± 19.4 residues, and the average identity was 40%.
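The reported mean and spread of the motif lengths can be reproduced from the motif lengths listed in Table 2. The sketch below is only a check under stated assumptions: the lengths were transcribed by hand from Table 2, the variable names are illustrative, and the sample standard deviation (n - 1 denominator) is assumed because it is the one that matches the reported value.

```python
from statistics import mean, stdev

# Motif lengths (a.a.) per protein, transcribed from Table 2.
motif_lengths = {
    "R": [63, 57, 38, 51, 70, 49, 37, 48, 62, 28, 21, 60, 29, 31, 52],
    "S": [28, 27, 23, 23, 53, 20, 31, 83, 61],
    "E": [14, 20],
    "M": [18, 50],
    "N": [40, 32, 85, 26],
    "PUP1": [68, 11, 33, 61],
    "PUP2": [24, 31],
    "PUP4": [29, 23, 63, 12, 18],
    "PUP5": [24, 52],
}

all_lengths = [n for lengths in motif_lengths.values() for n in lengths]

avg = mean(all_lengths)
sd = stdev(all_lengths)  # sample standard deviation (assumption of this sketch)

print(len(all_lengths))          # 45
print(f"{avg:.1f} ± {sd:.1f}")   # 39.5 ± 19.4
```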

The motifs shorter than 50 amino acids with 35-39% identity were examined for their
conservation in six coronaviruses. The data showed that all of them could be found in more than
two other coronaviruses, with identities from 43% to 75%. The average identity was 51.7%, in
contrast to 43.9% for the global alignments of the whole amino acid sequences. This suggests
that all the motifs are relatively highly conserved within the coronavirus family.

The motifs were distributed over nine ORFs encoding viral nonstructural and structural
proteins, and the coverage in each protein is shown in Table 1. Overall, the distribution was
sparser in the long R protein sequence than in the others, and denser in the short PUP1, PUP4
and PUP5.

The R protein of SARS-CoV is a polyprotein encoded by the largest ORF, which accounts for
almost two thirds of the whole viral genome. Of the 15 motifs identified in the R protein, six
were associated with enzymes, in agreement with the postulated functions of the protein, three
were likely to show antigenicity, and another three were associated with toxins (Fig. 1).
Because the R protein is not present in the virion, any antigenicity or virulence should depend
on post-translational modification, such as cleavage by one of the viral proteases (16). In the
leader protein, a subunit near the N-terminus of the R protein, one motif was likely to be
related to plasminogen activation.

Nine motifs localized in the S protein had a relatively 
even distribution along the entire amino acid sequence. 


TABLE 1. COVERAGE OF THE PREDICTED MOTIFS 
IN EACH PROTEIN OF SARS-CoV 


Protein      Length (aa)   Number of motifs   Coverage (%)
R protein    7073          15                  8.91
S protein    1255           9                 27.65
E protein      76           2                 44.74
M protein     221           2                 30.77
N protein     422           4                 41.47
PUP1          274           4                 63.14
PUP2          154           2                 31.17
PUP4          122           5                 94.26
PUP5           98           2                 77.55
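The coverage column can be understood as the fraction of each protein's residues that fall inside at least one predicted motif. The sketch below is one way to compute it, not necessarily the authors' procedure: the function names are illustrative, and the motif positions are transcribed from Table 2. Only proteins whose motifs do not overlap are used, because the text does not say how overlapping residues were counted.

```python
def covered_residues(intervals):
    """Number of residues covered by the union of inclusive (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the running interval: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping: extend the running interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def coverage_percent(protein_length, intervals):
    return 100.0 * covered_residues(intervals) / protein_length


# (protein length, motif positions) transcribed from Table 2, non-overlapping cases only.
proteins = {
    "E": (76, [(2, 15), (47, 66)]),
    "M": (221, [(7, 24), (172, 221)]),
    "PUP1": (274, [(49, 116), (137, 147), (161, 193), (214, 274)]),
    "PUP5": (98, [(1, 24), (47, 98)]),
}

for name, (length, motifs) in proteins.items():
    print(name, f"{coverage_percent(length, motifs):.2f}")
```

For these four proteins the computed values agree with the coverage column of Table 1 (44.74, 30.77, 63.14 and 77.55).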
FIG. 1. The distribution of the predicted motifs in the SARS-CoV proteins compared to the local
database.


Among these motifs, four were associated with neurotoxins and bullous pemphigoid antigen,
overlapping with the predicted glycosylation sites at codons 227, 330, 783, 1116, and 1140. In
the M protein, one motif possibly related to sAg/toxin was located in the N-terminal exterior
region, and another, enzyme-like motif was located in the C-terminal interior region. Both
motifs in the E protein were related to sAg/toxin. Two of the four motifs in the N protein were
clustered in an N-terminal region of 111 residues (45-152 a.a.), where a nuclear antigen
sequence was identified. The fourth antigen-associated motif was located between residues 275
and 300 of the N protein.

The identified motifs in the four PUPs were densely distributed (Fig. 1). We noticed that
motifs at different places in the same protein appeared to be related to the same toxic
molecule, such as those in PUP1 (Table 2). In other words, it is possible that several motifs
in a SARS-CoV protein might cooperate to implement one function.

Classification of the motifs. 1. Motifs associated with sAg and/or toxin. These motifs in the
SARS-CoV proteins can be divided into two groups. One group was similar to heat-labile
enterotoxin, staphylococcal enterotoxin, exfoliative toxin, botulinum neurotoxin, bungarotoxin
(a type of presynaptic neurotoxin), and cytotoxin 3 precursor; the other group was associated
with proteases such as herpesvirus protease, hydrolase, and some proteases of the Escherichia
coli O157:H7 strain (Table 2). Fourteen motifs relating to sAgs and toxic molecules were
identified (Fig. 1 and Table 2). Three motifs localized in the S protein were associated with
neurotoxin, and the two motifs identified in the E protein were both associated with botulinum
neurotoxin: the N-terminal exterior one was similar to the type D precursor, and the C-terminal
interior one was similar to the type B precursor. This suggests a possible functional
relationship between them. One enterotoxin-like motif resided near an antigen site at the
N-terminus of the N protein. Moreover, a region of 85 amino acids was identified as a nuclear
antigen. Their antigenicities have been validated in a recent study (8). In the middle of PUP1,
two toxin-associated motifs were detected. Almost three- 
TABLE 2. THE MOTIFS PREDICTED IN SARS-CoV (BJ01 ISOLATE)


         Length   Position    Identity      Motif length
Protein  (a.a.)   (a.a.)      (%)           (a.a.)        Matched sequence
R        7073     165-227     36            63            Putative enzymes [Escherichia coli O157:H7]
                  193-249     34            57            Bacterial surface antigen family protein [Pseudomonas putida KT2440]
                  336-373     36            38            Methionyl-tRNA synthetase
                  448-498     33            51            Plasminogen activator, urokinase [Homo sapiens]
                  1157-1226   31            70            P93 antigen
                  1196-1244   38            49            Apoptotic protease activating factor-1 long isoform APAF-1L
                  1465-1501   43            37            Anti-myosin immunoglobulin heavy chain variable region [Mus musculus]
                  1797-1844   35            48            Interleukin-4 precursor (IL-4)
                  2010-2071   34            62            Erythrocyte membrane-associated giant protein antigen
                  2933-2960   39            28            Botulinum neurotoxin type F precursor (bont/F) (Bontoxilysin F)
                  4441-4461   52            21            Heat-labile enterotoxin
                  5577-5636   31            60            Putative glycosyl transferase [Escherichia coli]
                  6475-6503   41            29            Mitogenic exotoxin Z-9 [Streptococcus pyogenes]
                  6554-6584   48            31            Superoxide dismutase-blue shark
                  6968-7019   31            52            Superantigen ypmc [Yersinia pseudotuberculosis]
S        1255     80-107      43            28            Coagulation factor II receptor; Thrombin receptor
                  147-173     37            27            Intercellular adhesion molecule 1 precursor; CD54 [Homo sapiens]
                  227-249     35            23            Hydrolase, presynaptic neurotoxin molecule: beta2-bungarotoxin
                  266-288     44            23            Intercellular adhesion molecule 1 precursor; CD54 [Homo sapiens]
                  286-338     32            53            Botulinum neurotoxin type G precursor (bont/G) (Bontoxilysin G)
                  634-653     45            20            Epidermal growth factor [Mus musculus]
                  759-789     39            31            Botulinum neurotoxin type G precursor (bont/G) (Bontoxilysin G)
                  970-1052    31            83            Peptidoglycan recognition protein-like [Mus musculus]
                  1123-1183   34            61            Bullous pemphigoid antigen 1-e [Mus musculus]
E        76       2-15        43            14            Botulinum neurotoxin type D precursor (Bontoxilysin D)
                  47-66       35            20            Botulinum neurotoxin type B precursor (Bontoxilysin B)
M        221      7-24        39            18            Exfoliative toxin a
                  172-221     37            50            Nitrate-inducible formate dehydrogenase-N alpha subunit [O157:H7]
N        422      2-41        30            40            SLP-76 associated protein (validated)
                  45-76       38            32            Staphylococcal enterotoxin a
                  68-152      31            85            Nuclear antigen 2 (validated)
                  275-300     39            26            Antigenic virion protein [Human herpesvirus 6B]
PUP1     274      49-116      34            68            O-antigen polymerase [Shigella boydii]
                  137-147     [illegible]   11            Botulinum neurotoxin type G precursor (Bontoxilysin G)
                  161-193     34            33            Botulinum neurotoxin type C1 precursor (Bontoxilysin C1)
                  214-274     33            61            Transcriptional regulator ume6
PUP2     154      102-125     38            24            [illegible]
                  120-150     31            31            Macrophage inflammatory protein 3 alpha
PUP4     122      3-31        47            29            Similar to CD83 antigen [Mus musculus] [Rattus norvegicus]
                  20-42       48            23            Tumor necrosis factor alpha; TNF alpha [Mus musculus]
                  47-109      32            63            Lymph node homing receptor (Leukocyte-endothelial cell adhesion molecule 1)
                  52-63       50            12            Botulinum neurotoxin type F precursor (Bontoxilysin F)
                  101-118     44            18            Cytotoxin 3 precursor (Cardiotoxin analog II)
PUP5     98       1-24        38            24            Tumor necrosis factor (ligand) superfamily, member 7 [Mus musculus]
                  47-98       31            52            Putative GDP-mannose-4,6-dehydratase; Ipsa [Caulobacter crescentus]


The overlap regions are indicated by bold font.
TABLE 3. HITS OF THE SELECTED MOTIFS IN SIX VIRUSES


              SARS-CoV   OC43   PEDV   HRSV   HHerpV   HAdenoV
Neurotoxin     8         17      2      20    27       29
Enterotoxin    4          2      8      13    10       13
IL-1           0          0      0       3     6        1
IL-2          17          0      0       0     3        2
Ig            12          7      1     273    16       18
TNF           16          0      0       0    35       20


Including the total hits matching to the local database.


fourths of the regions in PUP4 were covered by two suc- 
cessive toxin-like motifs near the N-terminal, overlapping 
a consensus sequence of lymph node homing receptor 
(Lnhr). 

2. Motifs associated with cytokines. Eight types of cytokine-like motifs were detected (Table
2), which were associated with IL-2, IL-4, macrophage inflammatory factors, plasminogen
activator and apoptotic protease activating factor, etc. Two thirds of PUP2 was covered by two
cytokine-like motifs, and about 30% of PUP5, near the N-terminus, was covered by a TNF-like
motif. Similarly, one TNF-like motif was also found in PUP4. Moreover, the existence of two
motifs associated with transcription factors and polymerase suggested that PUP1 might be
involved in transcriptional regulation.

3. Motifs associated with membrane surface molecules. Approximately five motifs were involved
in the consensus segments of surface antigens and receptors of membrane molecules, and these
were mainly located in the structural proteins (Table 2). It should be noted that two motifs
near the N-terminus of the S protein had a high similarity with intercellular adhesion molecule
1 (ICAM-1, CD54), which is important in mediating immune and inflammatory responses (2). An
Lnhr-like motif accounted for the major part of PUP4. The N protein, which is the component of
the nucleocapsid, contained a nuclear antigen-like motif. In the R protein, one region was found
to be similar to a bacterial surface antigen family protein.

Comparison of the motifs between SARS-CoV and five other viruses. We made a comparative
analysis of SARS-CoV and five common viruses. Table 3 shows that each type of virus has a
different composition of the observed motifs. Among the three coronaviruses, SARS-CoV had more
confident hits for IL-2-like and TNF-like motifs than the other two. SARS-CoV and OC43 had more
neurotoxin-like motifs than enterotoxin-like ones, whereas PEDV showed more hits for
enterotoxin-like motifs. HRSV was characterized by a high number of hits for Ig-like motifs. In
summary, both IL-2 and cytotoxin-like motifs appeared in SARS-CoV. This was different from
OC43, PEDV and HRSV, but somewhat similar to HHerpV and HAdenoV.
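The comparisons drawn from Table 3 can be restated as simple lookups on the hit counts. The sketch below uses the values transcribed from Table 3; the data layout and variable names are illustrative, and the checks only re-express statements already made in the text.

```python
viruses = ["SARS-CoV", "OC43", "PEDV", "HRSV", "HHerpV", "HAdenoV"]

# Hit counts per motif class, transcribed from Table 3 (same column order as `viruses`).
hits = {
    "Neurotoxin":  [8, 17, 2, 20, 27, 29],
    "Enterotoxin": [4, 2, 8, 13, 10, 13],
    "IL-1":        [0, 0, 0, 3, 6, 1],
    "IL-2":        [17, 0, 0, 0, 3, 2],
    "Ig":          [12, 7, 1, 273, 16, 18],
    "TNF":         [16, 0, 0, 0, 35, 20],
}


def count(motif, virus):
    return hits[motif][viruses.index(virus)]


# Viruses with more neurotoxin-like than enterotoxin-like hits.
neuro_dominant = [v for v in viruses if count("Neurotoxin", v) > count("Enterotoxin", v)]

# Viruses that show both IL-2-like and TNF-like hits.
il2_and_tnf = [v for v in viruses if count("IL-2", v) > 0 and count("TNF", v) > 0]

# Virus with the most Ig-like hits.
top_ig = max(viruses, key=lambda v: count("Ig", v))

print(neuro_dominant)
print(il2_and_tnf)
print(top_ig)
```

Under these values, PEDV is the only virus in the table with fewer neurotoxin-like than enterotoxin-like hits, and SARS-CoV groups with HHerpV and HAdenoV in having both IL-2-like and TNF-like hits.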


DISCUSSION 


In this study, we identified 45 motifs that are similar to known proteins relating to sAgs,
toxins, cytokines and antigens, whose functions have already been confirmed in previous
studies. The sequences are relatively conserved across species, which suggests that these
motifs may play similar functions (1). At the same time, the results strongly suggest that the
distribution of the motifs coincides with the function of the proteins. For instance,
enzyme-like motifs were located in the R protein, adhesion molecule-like motifs in the S
protein, a nuclear antigen-like motif in the N protein, and motifs associated with a
transcriptional regulator in PUP1. Thus, identification of these motifs provides positive
evidence for analyzing and understanding the functions of the relevant regions in the proteins
of SARS-CoV.

The whole set of motifs in SARS-CoV is different from those of OC43, PEDV, HRSV, HHerpV and
HAdenoV. The particular set of motifs in each virus may determine a different immune reaction.

The antigenic and toxin-like motifs play a key role in the pathogenesis of SARS. SARS-CoV
shares some common features with many other RNA viruses, but it also has distinct features in
infecting the human body and causing disease, which are determined by its special viral
proteins. The motifs, as small conserved regions within a large biological sequence, are
essential to the function of the virus. Both the structural and the nonstructural proteins of
SARS-CoV have biological activities. It is not difficult to understand what would happen when
many toxic and antigenic motifs of the active proteins are exposed to the host body.

The data show that SARS-CoV possesses many motifs associated with sAgs, toxins and cytokines.
The toxic motifs are mainly related to neurotoxin, enterotoxin, cytotoxin and some proteases.
The cytokine-like motifs are related to inflammatory factors, apoptosis factors and TNF.

The sAgs are powerful microbial toxins that target the host immune system by directly binding
MHC-II and the T cell receptor (TCR), without requiring processing by APCs (4). Since Hillyard
first discovered that a group of viral sAgs (minor lymphocyte stimulating antigens) could
generate a strong T-cell proliferative response, more and more evidence has suggested that viral
sAgs are involved in immune-mediated diseases (14). Unlike normal peptide antigens, which
stimulate only between 0.0001% and 0.001% of T cells, many types of sAgs can activate up to 20%
of all T cells. Even when they derive from the same ancestral gene, sAgs of the same type may
have variant structures, for instance the staphylococcal and streptococcal enterotoxins,
although they retain similar virulence (4,9).

The existence of sAg-like motifs in SARS-CoV provides clear evidence for interpreting how the
virus causes the disease. The sAg and toxic molecules on the PUPs and the S, E and N proteins
could bypass the normal immune pathway and directly stimulate T cells. If no costimulatory
signal takes part in the process, the T cells cannot be fully activated and tend toward
apoptosis, which would cause a marked decrease of the T-cell level in the blood, especially of
CD4+ and CD8+ cells. Consequently, massive amounts of inflammatory factors are released in a
short time and induce autoimmunity (5,10). Furthermore, the danger is that human alveolar cells
are widely attacked by the host's own cytokines, in addition to probable direct damage from the
virion, followed by fever, muscle pain, pulmonary edema and allergic angiitis. The enterotoxin
motifs on the S protein and PUP synthesized by SARS-CoV can sometimes lead to diarrhea, but
this seems to be less serious than the PEDV-caused illness in swine, as PEDV has more
enterotoxin motifs (Table 3).

Unlike HRSV, OC43 and PEDV, SARS-CoV has special IL-2-like and TNF-like motifs, which must play
an exclusive role in virulence. It is reported that TNF-α, an inflammatory cytokine, is
associated with bronchial hyperresponsiveness by reducing the response to β-agonists and
increasing the reactivity to methacholine, the airway neutrophils and alveolar macrophages (9).
These motifs, located in PUP4 and PUP5, might act as allergens.

We also focused on PUP4, which contains a toxic motif similar to TNF and others similar to an
endothelial cell adhesion molecule. It is postulated that PUP4 might actively take part in
interfering with the host's normal immune reaction and result in an immune disorder.

The N protein, as the component of the nucleocapsid, has an outstanding number of
antigen-associated motifs, which implies that its antigenicity is important.

The ICAM-1-like motif in the S protein has a crucial role in targeting and invading the host
cells with special affinity, and is supposed to mediate the adherence of the virion to alveolar
epithelial cells (11), intestinal epithelial cells, brain cells, etc.

The virion, which compactly integrates multiple motifs, may play complex roles in making use of
the host immune system and interfering with normal immune regulation to counterattack the host.
This is not an exaggeration but a fact, for the construction of the viral proteins is
remarkably ingenious.

Significance for vaccine and drug design, diagnosis, and therapy. In summary, the recognition
of the structure of SARS-CoV not only helps us understand the molecular basis of its virulence
and pathogenesis, but also provides detailed information and new approaches for developing more
specific diagnostics and more effective therapeutics against SARS. Based on this study, it is
essential to avoid the toxin-like structures and to select specific antigenic determinants
during vaccine preparation. The accuracy of molecular probes should be refined in order to
improve diagnostic precision. When cytokines in the blood are measured, the potential errors
caused by cross-reaction between SARS motifs and antibodies should not be underestimated. One
of the important therapeutic strategies is to block the viral attack on the host immune system,
so anti-toxin antibodies should be a good choice for treating this disease.